DMA engine for fetching words in reverse bit order

ABSTRACT

Presented herein is a direct memory access engine for providing data words in reverse order. The data words are fetched in batches comprising a predetermined number of data words starting from the last data word and proceeding to the first data word. The batches are stored in a local buffer. The contents of the local buffer are transmitted in reverse order. A set of multiplexers reverses the bit positions of the words in the local buffer.

RELATED APPLICATIONS

This application claims priority to Provisional Application for U.S. Patent, Ser. No. 60/494,666, (Attorney Docket Number 15137US01) entitled “DMA Engine for Fetching Words in Reverse Order”, filed Aug. 13, 2003, and Provisional Application for U.S. Patent, Ser. No. 60/494,746, (Attorney Docket Number 15138US01) entitled “DMA Engine for Fetching Words in Reverse Order”, filed Aug. 13, 2003, which are incorporated herein by reference for all purposes.

This application is also related to Application for U.S. Patent, Serial No. ______, entitled “DMA Engine for Fetching Words in Reverse Order”, filed ______.

FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[Not Applicable]

[MICROFICHE/COPYRIGHT REFERENCE]

[Not Applicable]

BACKGROUND OF THE INVENTION

The MPEG-2 standard uses video packets comprising any number of macroblocks to confine bit errors. Frames are represented by a set of macroblocks. The macroblocks are grouped into a data structure known as a video packet. In MPEG-2, all macroblock rows start with a new video packet. During the decoding of a video packet, if an error is encountered, the video decoder can simply drop the remaining macroblocks that followed the bit error. In this manner, only a small amount of information is lost as a result of the bit error.

MPEG-4 Part 2 also uses video packets to confine bit errors. The video data is transmitted as a video elementary stream. The portions of the video elementary stream that are at the video packet level and lower are encoded with a variable length code. In MPEG-4 Part 2, the video packet is defined such that the video packet can be decoded in a forward order, or the reverse order. Accordingly, after encountering an error, the video decoder can go to the end of the video packet, and start decoding in the reverse order, until the same error, or another error is encountered. In this manner, a greater portion of the video packet is recovered and reconstructed, in spite of encountering error(s).

To take advantage of the foregoing feature, the video decoder needs to be able to receive and decode the video bitstream in reverse order in real-time. During the decoding, the video elementary stream is stored in a memory known as the compressed data buffer in the forward order, along with a table that indicates the starting addresses of each video packet. The video decoder receives and decodes the video elementary stream by accessing the compressed data buffer. Upon encountering an error, the video decoder can receive the video packet starting at the ending address of the video packet and moving in the reverse order.

Receiving the video packet in reverse order can be made possible by manipulating the memory access. For example, the video decoder can sequentially access data words in reverse order. After accessing each data word, the video decoder can use logic to reverse the bit order of the data word. However, the foregoing adds significant operations to the video decoder and makes accessing and decoding in the reverse order difficult to perform in real-time.

Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the invention as set forth in the remainder of the present application with reference to the drawings.

BRIEF SUMMARY OF THE INVENTION

Presented herein is a direct memory access engine for fetching words in reverse order. In one embodiment, there is presented a method for providing a plurality of sequential data words. The method includes receiving a command to provide the plurality of sequential data words, wherein the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word, fetching a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word, storing the sequential portion, transmitting at least a portion of the last data word, and transmitting at least a portion of the intermediate data words after transmitting at least the portion of the last data word.

In another embodiment, there is presented a system for providing a plurality of sequential data words. The system comprises a state logic machine, a memory controller, a local buffer, and a port. The state logic machine receives a command to provide the plurality of sequential data words, wherein the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word. The memory controller fetches a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word. The local buffer stores the sequential portion. The port transmits at least a portion of the last data word and transmits at least a portion of the intermediate data words after transmitting at least the portion of the last data word.

In another embodiment, there is presented a system for decoding a video packet. The system comprises a compressed data buffer, a video decoder, and a direct memory access engine. The compressed data buffer comprises a plurality of sequential data words. The plurality of sequential data words store a video packet. The video decoder decodes the video packet. The direct memory access engine provides the video packet to the video decoder and comprises a state logic machine, a memory controller, a local buffer, and a port. The state logic machine receives a command to provide the plurality of sequential data words and a control signal indicating reverse order from the video decoder, wherein the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word. The memory controller fetches a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word. The local buffer stores the sequential portion. The port transmits at least a portion of the last data word and transmits at least a portion of the intermediate data words after transmitting at least the portion of the last data word.

These and other advantages and novel features of the present invention, as well as details of illustrated embodiments thereof, will be more fully understood from the following description and drawings.

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram describing the encoding of video data;

FIG. 2 is a block diagram describing an exemplary decoder in accordance with an embodiment of the present invention;

FIG. 3 is a block diagram of an exemplary compressed data buffer;

FIG. 4 is a block diagram of an exemplary direct memory access engine in accordance with an embodiment of the present invention; and

FIG. 5 is a flow diagram for accessing data words in reverse order in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, there is illustrated a block diagram describing MPEG formatting of video data 305. The video data 305 comprises a series of frames 310. Each frame comprises two-dimensional grids of luminance Y, chroma red Cr, and chroma blue Cb pixels 315. The two-dimensional grids are divided into 8×8 blocks 335, where four blocks 335 of luminance pixels Y are associated with a block 335 of chroma red Cr, and a block 335 of chroma blue Cb pixels. The four blocks of luminance pixels Y, the block of chroma red Cr, and the chroma blue Cb form a data structure known as a macroblock 337. The macroblock 337 also includes additional parameters, including motion vectors.

The macroblocks 337 representing a frame are grouped into different video packets 340. The video packet 340 includes the macroblocks 337 in the video packet 340, as well as additional parameters describing the video packet. Each of the video packets 340 forming the frame form the data portion of a picture structure 345. The picture 345 includes the video packets 340 as well as additional parameters. The pictures are then grouped together as a group of pictures 350. The group of pictures 350 also includes additional parameters. Groups of pictures 350 are then stored, forming what is known as a video elementary stream 355. The video elementary stream 355 is then packetized to form a packetized elementary sequence 360. Each packet is then associated with a transport header 365 a, forming what are known as transport packets 365 b.

The transport packets 365 b can be multiplexed with other transport packets 365 b carrying other content, such as another video elementary stream 355 or an audio elementary stream. The multiplexed transport packets form what is known as a transport stream. The transport stream is transmitted over a communication medium for decoding and presentation.

Referring now to FIG. 2, there is illustrated a block diagram of an exemplary decoder for decoding compressed video data, configured in accordance with an embodiment of the present invention. A processor, that may include a CPU 490, reads a stream of transport packets 365 b (a transport stream) into a transport stream buffer 432 within an SDRAM 430.

The data is output from the transport stream presentation buffer 432 and is then passed to a data transport processor 435. The data transport processor then demultiplexes the MPEG transport stream into its PES constituents and passes the audio transport stream to an audio decoder 460 and the video transport stream to a video transport processor 440.

The video transport processor 440 converts the video transport stream into a video elementary stream and provides the video elementary stream to an MPEG video decoder 445 that decodes the video. The video elementary stream 355 is stored in a compressed data buffer (CDB) 447. The MPEG video decoder 445 accesses the compressed data buffer (CDB) to receive the video elementary stream 355. The video elementary stream 355 is decoded by the MPEG video decoder 445 resulting in the reconstructed video data 305.

The audio data is sent to the output blocks and the video data 305 is sent to a display engine 450. The display engine 450 is responsible for and operable to scale the video picture, render the graphics, and construct the complete display among other functions. Once the display is ready to be presented, it is passed to a video encoder 455 where it is converted to analog video using an internal digital to analog converter (DAC). The digital audio is converted to analog in the audio digital to analog converter (DAC) 465.

Referring now to FIG. 3, there is illustrated a block diagram describing an exemplary compressed data buffer 447. The compressed data buffer 447 includes any number of data words 505(1) . . . 505(m). The data words can have any width. In an exemplary case, for example, the data words can comprise 256 bit jumbo words (words).

The compressed data buffer 447 stores the video elementary stream 355. The video elementary stream 355 comprises any number of video packets 340. The video packets 340 further comprise a video packet header and any number of macroblocks 337. The compressed data buffer 447 also stores a start code table 507. The start code table 507 associates each video packet 340 with its starting address in the compressed data buffer 447. Alternatively, the video packet 340 can first be scanned forward without evacuating data from memory, the number of bytes/bits can be counted, and the packet can then be returned using the DMA engine.

The MPEG video decoder 445 receives the video packets 340 from the video elementary stream 355 and decodes the video packets 340. The video packet 340 is received and decoded by the MPEG video decoder 445 starting from the word 505(x) storing the beginning of the video packet 340, and proceeding to the word 505(n) storing the end of the video packet 340.

A direct memory access (DMA) engine 510 facilitates receipt of the video packets 340 by the MPEG video decoder 445. Alternatively, a processor can facilitate receipt of the video packets 340. Accordingly, DMA engine 510 shall be interpreted to also include a processor that is operable to fetch video packets from memory. The MPEG video decoder 445 receives a video packet 340 by looking up the starting address and the ending address of a video packet 340 in the start code table 507. The MPEG video decoder 445 can then command the DMA engine 510 to fetch the words 505(x) . . . 505(n) that store the video packet 340. Responsive thereto, the DMA engine 510 fetches and provides the words 505(x) . . . 505(n) that store the video packet 340.
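By way of illustration only, the address lookup described above can be sketched as follows. The sketch assumes, hypothetically, that the start code table 507 is held as a sorted list of word-aligned starting addresses and that the ending address of a video packet is the word immediately preceding the next packet's starting address. The names START_ADDRESSES, STREAM_END, and packet_range are illustrative and do not appear in the embodiment.

```python
# Hypothetical contents of a start code table: the word address at which
# each video packet begins, in ascending order (assumed word-aligned).
START_ADDRESSES = [0, 40, 95, 130]
# Assumed address one past the last word holding stream data.
STREAM_END = 200


def packet_range(packet_index):
    """Return the inclusive (start, end) word addresses of one video packet.

    The ending address is inferred from the next packet's starting
    address; this inference is an assumption of the sketch.
    """
    start = START_ADDRESSES[packet_index]
    if packet_index + 1 < len(START_ADDRESSES):
        end = START_ADDRESSES[packet_index + 1] - 1
    else:
        end = STREAM_END - 1
    return start, end
```

The pair returned by packet_range corresponds to the addresses of words 505(x) and 505(n) that the decoder would pass to the DMA engine 510.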

The DMA engine 510 provides the words 505(x) . . . 505(n) to an extractor 515 within the MPEG video decoder 445 in a serial manner, beginning with word 505(x) and proceeding to the last word 505(n). The MPEG decoder 445 decodes the video packet 340, in a serial manner, beginning decoding with the first word 505(x) and proceeding to the last word 505(n). The extractor 515 and the DMA engine 510 operate in conjunction with each other, such that the words 505 are provided to the MPEG video decoder 445 at a dynamic rate that is in substantial relationship to the rate that the MPEG video decoder 445 is decoding the words 505.

In MPEG-4 Part 2, the video packet 340 is defined such that the video packet 340 can be decoded in a forward order, or the reverse order. Accordingly, if the MPEG video decoder 445 encounters an error, the video decoder can go to the end of the video packet 340, and start decoding in the reverse order. For example, if the MPEG video decoder 445 decodes the video packet 340 beginning with the first word 505(x), and encounters an error in word 505(x+5), the MPEG video decoder 445 can start decoding the video packet 340 from word 505(n) and decode in the reverse order, e.g., word 505(n−1), 505(n−2) . . . , etc.

Upon detecting an error, the MPEG video decoder 445 transmits a command to the DMA engine 510 to fetch the words storing the video packet 340, e.g., words 505(x) . . . 505(n), along with a reverse order signal. Responsive thereto, the DMA engine 510 provides the words 505(x) . . . 505(n) in the reverse order to the MPEG video decoder 445.

Referring now to FIG. 4, there is illustrated a block diagram describing an exemplary DMA engine 510 in accordance with an embodiment of the present invention. The DMA engine 510 comprises a state logic machine 605, a local buffer 610, and a memory controller 620. The local buffer 610 can comprise any amount of memory with any width of data words. For example, in an exemplary case, the memory can comprise 128 32-bit words 611(0) . . . 611(127).

The state logic machine 605 receives a command to fetch data words in an address range, e.g., 505(x)-505(n), from the MPEG video decoder 445. The command can be accompanied by a control signal indicating that the data words in the address range are to be provided to the MPEG video decoder 445 in the reverse order, e.g., 505(n), 505(n−1) . . . 505(x).

Responsive to receiving a command to fetch the data words 505(x)-505(n) in the reverse order, the state logic machine 605 commands the memory controller 620 to retrieve a batch comprising the last predetermined number of data words 505 in the provided address range, and store the predetermined number of data words in the local buffer 610. The predetermined number of data words in the batch is less than or equal to the capacity of the local buffer 610. For example, in an exemplary embodiment, where the local buffer 610 comprises 128 32-bit words, the batch of data words 505 can include the last 16 words in the provided address range, e.g., data words 505(n−15) . . . 505(n).
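The batch size in the exemplary embodiment follows from the buffer capacity: 128 words of 32 bits hold 4096 bits, which is exactly 16 data words of 256 bits. A minimal sketch of this arithmetic is given below, assuming the exemplary widths stated above; the names WORD_BITS, SLOT_BITS, BUFFER_SLOTS, and first_batch are illustrative only.

```python
WORD_BITS = 256      # width of a data word 505 in the exemplary case
SLOT_BITS = 32       # width of a local buffer word 611
BUFFER_SLOTS = 128   # number of local buffer words 611(0) .. 611(127)

# Largest number of whole data words that fit in the local buffer.
BATCH_WORDS = (BUFFER_SLOTS * SLOT_BITS) // WORD_BITS


def first_batch(n, batch_words=BATCH_WORDS):
    """Inclusive address range of the first batch fetched in reverse mode.

    n is the address of the last data word 505(n). Truncation at the
    first data word 505(x) is not modeled here.
    """
    return n - batch_words + 1, n
```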

After the batch of data words 505(n−15) . . . 505(n) is stored in the local buffer 610, the state logic machine 605 causes the contents of the local buffer 610 to be provided to the MPEG video decoder 445 beginning with word 611(127), and proceeding sequentially to word 611(0). After the contents of the local buffer 610, e.g., words 611(127) . . . 611(0), are provided to the MPEG video decoder 445, the state logic machine 605 commands the memory controller 620 to fetch another batch comprising the predetermined number of words, e.g., data words 505(n−31) . . . 505(n−16), that precede the most recently fetched data words, e.g., data words 505(n−15) . . . 505(n). The data words 505(n−31) . . . 505(n−16) are stored in the local buffer 610 and provided to the MPEG decoder 445.
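The storing and read-out of a batch can be sketched as follows. The sketch assumes that each 256-bit data word is split into eight 32-bit portions with the most significant portion at the lowest buffer address, so that the least significant portion of the last data word lands at the ending address, consistent with claims 5 and 13. The function names are illustrative and the data words are modeled as Python integers.

```python
WORD_BITS = 256
SLOT_BITS = 32
SLICES_PER_WORD = WORD_BITS // SLOT_BITS   # 8 portions per data word
SLOT_MASK = (1 << SLOT_BITS) - 1


def fill_local_buffer(batch):
    """Split each data word into 32-bit portions, most significant first.

    batch lists the data words in ascending address order. With a full
    batch of 16 words the result has 128 entries, modeling 611(0)..611(127).
    """
    slots = []
    for word in batch:
        for i in reversed(range(SLICES_PER_WORD)):
            slots.append((word >> (i * SLOT_BITS)) & SLOT_MASK)
    return slots


def read_out_reverse(slots):
    """Return the buffer contents from the highest address down to address 0."""
    return slots[::-1]
```

Under this layout, the first portion provided to the decoder is the least significant 32 bits of the last data word in the batch.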

The foregoing is repeated until the next predetermined number of data words comprises the first data word in the address range, e.g., data word 505(x). Where a batch comprises the first data word in the address range, e.g., data word 505(x), the state logic machine 605 truncates that portion of the predetermined number of data words that precedes the first data word 505(x), and commands the memory controller 620 to fetch the truncated batch comprising the first data word 505(x) and all data words 505(x+1), 505(x+2) . . . , following the first data word 505(x) that have not been previously transmitted to the MPEG video decoder 445.
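The sequence of batches, including the truncated final batch, can be sketched as follows. This is a minimal model assuming inclusive word addresses and a batch size of 16; the function name reverse_batches is illustrative.

```python
def reverse_batches(x, n, batch_words=16):
    """Return the inclusive (lo, hi) address ranges in the order fetched.

    x and n are the addresses of the first and last data words. The final
    batch is truncated so that it never extends below x.
    """
    batches = []
    hi = n
    while hi >= x:
        lo = max(x, hi - batch_words + 1)   # truncate at the first data word
        batches.append((lo, hi))
        hi = lo - 1                         # next batch precedes this one
    return batches
```

For an address range of 10 through 50, for instance, the sketch yields two full batches followed by a truncated batch of nine words beginning at address 10.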

The foregoing provides the data words 505(x) . . . 505(n) as a set of 32 bit words starting from the last portion of 505(n) and proceeding sequentially to the first portion of 505(x). The bits forming the 32-bit words can be reversed with respect to one another, in any number of ways. For example, the MPEG video decoder 445 can include logic that reverses the 32 bits of each word. Alternatively, the DMA engine 510 can include additional circuitry that causes the 32 bits of each word 611 to be provided to the MPEG video decoder 445 in the reverse order.
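The bit position reversal of a 32-bit word can be modeled in software as shown below. The sketch is a behavioral model only, in which output bit i takes the value of input bit 31 − i; it does not represent the multiplexer circuitry itself, and the name reverse_bits32 is illustrative.

```python
def reverse_bits32(value):
    """Reverse the bit positions of a 32-bit value.

    Output bit i is set exactly when input bit (31 - i) is set.
    """
    result = 0
    for i in range(32):
        if value & (1 << (31 - i)):
            result |= 1 << i
    return result
```

Applying the reversal twice returns the original word, which provides a simple check on any implementation.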

Referring now to FIG. 5, there is illustrated a flow diagram for providing a video packet in a reverse order. At 705, the state logic machine 605 receives a command to fetch data words in an address range, e.g., 505(x)-505(n), from the MPEG video decoder 445, accompanied by a control signal indicating that the data words in the address range are to be provided to the MPEG video decoder 445 in the reverse order, e.g., 505(n), 505(n−1) . . . 505(x).

Responsive to receiving the command, the state logic machine 605 determines (706) if a predetermined number of words comprises the first word, 505(x). Where the predetermined number of words comprises the first data word in the address range, e.g., data word 505(x), the state logic machine 605 truncates that portion of the predetermined number of data words that precedes the first data word 505(x) and commands (708) the memory controller 620 to fetch (709) the truncated batch comprising the first data word 505(x) and all data words 505(x+1), 505(x+2) . . . following the first data word 505(x) that have not been previously transmitted to the MPEG video decoder 445. Where during 706, the state logic machine 605 determines that the predetermined number of words does not comprise the first word, 505(x), the state logic machine 605 commands (710) the memory controller 620 to fetch (715) a batch comprising the last predetermined number of data words 505 in the provided address range.

The fetched data words 505 are stored (720) in the local buffer 610. After the data words are stored in the local buffer 610, the state logic machine 605 causes the contents of the local buffer 610 to be provided (725) to the MPEG video decoder 445 beginning with word 611(127), and proceeding sequentially to word 611(0). After the contents of the local buffer 610, e.g., words 611(127) . . . 611(0), are provided to the MPEG video decoder 445, a determination (730) is made whether the first data word, data word 505(x) has been provided to the MPEG video decoder 445. If the first data word has not been provided to the MPEG video decoder at 730, 705-730 are repeated. If the first data word has been provided to the MPEG video decoder at 730, the process is completed.
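The flow of FIG. 5 can be modeled end to end as in the following sketch, with the reference numerals shown as comments. The sketch assumes the exemplary widths (256-bit data words, 32-bit buffer words, batches of 16), that each data word is stored most significant portion first, and that the bit reversal is performed on the DMA side. The compressed data buffer is modeled as a Python list of integers, and all names are illustrative.

```python
WORD_BITS, SLOT_BITS, BATCH_WORDS = 256, 32, 16
SLICES = WORD_BITS // SLOT_BITS
MASK = (1 << SLOT_BITS) - 1


def reverse_bits32(value):
    """Reverse the bit positions of a 32-bit value."""
    return int(format(value, "032b")[::-1], 2)


def reverse_dma(memory, x, n):
    """Return the 32-bit words provided for addresses x..n in reverse order."""
    out = []
    hi = n  # 705: command received for range x..n with reverse order signal
    while True:
        lo = max(x, hi - BATCH_WORDS + 1)   # 706/708: truncate at first word
        batch = memory[lo:hi + 1]            # 709/715: fetch the batch
        slots = []                           # 720: store in the local buffer
        for word in batch:
            for i in reversed(range(SLICES)):
                slots.append((word >> (i * SLOT_BITS)) & MASK)
        for slot in reversed(slots):         # 725: provide highest slot first
            out.append(reverse_bits32(slot))
        if lo == x:                          # 730: first data word provided?
            return out
        hi = lo - 1                          # repeat with the preceding batch
```

Under these assumptions, concatenating the output words most significant bit first gives the forward bitstream of words x through n read entirely backwards, which is the property a reverse decoder relies upon.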

One embodiment of the present invention may be implemented as a board level product, as a single chip, application specific integrated circuit (ASIC), or with varying levels integrated on a single chip with other portions of the system as separate components. The degree of integration of the system will primarily be determined by speed and cost considerations. Because of the sophisticated nature of modern processors, it is possible to utilize a commercially available processor, which may be implemented external to an ASIC implementation of the present system. Alternatively, if the processor is available as an ASIC core or logic block, then the commercially available processor can be implemented as part of an ASIC device with various functions implemented as firmware.

While the invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

1. A method for providing a plurality of sequential data words, said method comprising: receiving a command to provide the plurality of sequential data words, wherein the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word; fetching a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word; storing the sequential portion; transmitting at least a portion of the last data word in reverse bit position order; and transmitting at least a portion of the intermediate data words after transmitting at least the portion of the last data word in reverse bit position order.
2. The method of claim 1, further comprising: fetching another sequential portion of the sequential data words, the another sequential portion comprising a second intermediate data word, immediately followed by one or more data words, immediately followed by a third intermediate data word, the third intermediate data word immediately preceding the first intermediate word; storing the another sequential portion; transmitting at least a portion of the third intermediate word in reverse bit position order; and transmitting at least a portion of the second intermediate word after transmitting at least the portion of the third intermediate word in reverse bit position order.
3. The method of claim 1, wherein storing further comprises: storing the sequential portion in a memory, the memory having a beginning address and an ending address, and wherein at least the portion of the last data word is stored at the ending address and wherein at least the portion of the first intermediate word is stored in the beginning address.
4. The method of claim 3, wherein the memory is characterized by a width, and the data words are characterized by a width, the width of the memory being smaller than the width of the data words.
5. The method of claim 3, wherein the last data word comprises at least the portion of the last data word and at least another portion, wherein at least the portion comprises the least significant bits of the last data word, and wherein the at least another portion comprises the most significant bits of the last data word, and wherein storing the portion further comprises: storing the at least another portion of the last data word at an address preceding the ending address.
6. The method of claim 5, further comprising: transmitting the at least another portion of the last word in reverse bit position order after transmitting at least the portion of the last word in reverse bit position order.
7. The method of claim 1, wherein the one or more data words comprise a predetermined number of data words.
8. The method of claim 1, wherein the plurality of sequential data words stores a slice group.
9. A system for providing a plurality of sequential data words, said system comprising: a state logic machine for receiving a command to provide the plurality of sequential data words, the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word; a memory controller for fetching a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word; a local buffer for storing the sequential portion; and a plurality of multiplexers for reversing bit positions of at least a portion of the last data word and reversing bit positions of at least a portion of the intermediate data word; a port for transmitting at least a portion of the last data word in the reverse bit position order and transmitting at least a portion of the intermediate data word in reverse bit position order after transmitting at least the portion of the last data word.
10. The system of claim 9, wherein: the memory controller fetches another sequential portion of the sequential data words, the another sequential portion comprising a second intermediate data word, immediately followed by one or more data words, immediately followed by a third intermediate data word, the third intermediate data word immediately preceding the first intermediate word; the local buffer stores the another sequential portion; and the port transmits at least a portion of the third intermediate word and transmits at least a portion of the second intermediate word after transmitting at least the portion of the third intermediate word.
11. The system of claim 9, wherein the local buffer is associated with a beginning address and an ending address, and wherein a memory location at the ending address stores at least the portion of the last data word and wherein a memory location at the beginning address stores at least the portion of the first intermediate word.
12. The system of claim 11, wherein the local buffer is characterized by a width, and the data words are characterized by a width, the width of the local buffer being smaller than the width of the data words.
13. The system of claim 11, wherein the last data word comprises at least the portion of the last data word and at least another portion, wherein at least the portion comprises the least significant bits of the last data word, and wherein the at least another portion comprises the most significant bits of the last data word, and wherein a memory location at an address preceding the ending address stores the at least another portion of the last data word.
14. The system of claim 13, wherein the port transmits the at least another portion of the last word after transmitting at least the portion of the last word.
15. The system of claim 9, wherein the one or more data words comprise a predetermined number of data words.
16. The system of claim 9, wherein the plurality of sequential data words stores a slice group.
17. A system for decoding a slice group, said system comprising: a compressed data buffer comprising a plurality of sequential data words, the plurality of sequential data words for storing a slice group; a video decoder for decoding the slice group; and a direct memory access engine for providing the slice group to the video decoder, the direct memory access engine comprising: a state logic machine for receiving a command to provide the plurality of sequential data words and a control signal indicating reverse order from the video decoder, the plurality of sequential data words comprises a first data word and a last data word, and one or more data words between the first data word and the last data word; a memory controller for fetching a sequential portion of the sequential data words, said sequential portion comprising a first intermediate word, the last word, and one or more data words between the intermediate word and the last word; a local buffer for storing the sequential portion; a plurality of multiplexers for reversing the bit positions of the first intermediate word and the last data word; and a port for transmitting at least a portion of the last data word in reverse bit position order and transmitting at least a portion of the intermediate data words in reverse bit position order after transmitting at least the portion of the last data word.